Abstract

Background: The major histocompatibility complex (MHC) is an important component of the vertebrate immune 
system and is frequently used to characterise adaptive variation in wild populations due to its co-evolution with 
pathogens. Passerine birds have an exceptionally diverse MHC with multiple gene copies and large numbers of 
alleles compared to other avian taxa. The Nesospiza bunting species complex (two species on Nightingale Island; 
one species with three sub-species on Inaccessible Island) represents a rapid adaptive radiation at a small, isolated 
archipelago, and is thus an excellent model for the study of adaptation and speciation. In this first study of MHC in 
Nesospiza buntings, we aim to characterize MHCIIB variation, determine the strength of selection acting at this gene 
region and assess the level of shared polymorphism between the Nesospiza species complex and its putative sister 
taxon, Rowettia goughensis, from Gough Island. 

Results: In total, 23 unique alleles were found in 14 Nesospiza and 2 R. goughensis individuals encoding at least 
four presumably functional loci and two pseudogenes. There was no evidence of ongoing selection on the peptide 
binding region (PBR). Of the 23 alleles, 15 were found on both the islands inhabited by Nesospiza species, and 
seven in both Nesospiza and Rowettia; indications of shared, ancestral polymorphism. A gene tree of Nesospiza 
MHCIIB alleles with several other passerine birds shows three highly supported Nesospiza-specific groups. All R. 
goughensis alleles were shared with Nesospiza, and these alleles were found in all three Nesospiza sequence groups 
in the gene tree, suggesting that most of the observed variation predates their phylogenetic split. 

Conclusions: Lack of evidence of selection on the PBR, together with shared polymorphism across the gene tree, 
suggests that population variation of MHCIIB among Nesospiza and Rowettia is due to ancestral polymorphism rather 
than local selective forces. Weak or no selection pressure could be attributed to low parasite load at these isolated 
Atlantic islands. The deep divergence between the highly supported Nesospiza-specific sequence Groups 
2 and 3, and the clustering of Group 3 close to the distantly related passerines, provide strong support for preserved 
ancestral polymorphism, and present evidence of one of the rare cases of extensive ancestral polymorphism in birds. 



Background 

Understanding the principles that govern the generation 
and maintenance of functional genetic diversity is funda- 
mental to evolutionary biology. Large reductions in 
population size, through bottleneck or founder events, 
result in a loss of genetic diversity [1] which may affect 
the ability of populations to adapt and survive in chan- 
ging environments [1,2]. However, genes of ecological 
adaptive importance may maintain variation through a 
severe reduction in population size through processes 
such as balancing selection [3,4]. The Major Histocom- 
patibility Complex (MHC) is such a functional locus, 
and has been extensively studied in both model and 
non-model species [5-7]. 
The MHC is a multigene family involved in the vertebrate 
immune response [8], and is the most polymorphic 
set of genes known in vertebrates [9,10]. MHC variation 
is driven by an arms race between host and pathogen, 
where balancing selection maintains alleles in the popu- 
lation. An extensive repertoire of alleles enables the 
population to respond rapidly to changing or novel 
pathogens [11-13]. The highly variable peptide binding 
region (PBR) encoded by MHC class II β exon 2 
(MHCIIβ) ensures the binding of a large number of conformationally 
different peptides [8]. The PBR of MHC 
molecules is involved in antigen recognition and as 
such may be under strong balancing selection when 
compared with the non-PBR sites [14]. Although the 
major driving force behind MHC diversity is host- 
pathogen co-evolution [11,15], sexual selection and 
selection against deleterious mutations also play a role 
in the maintenance of MHC variation [16-18]. 

Like many multi-gene families, MHC is governed by 
the birth-and-death model of evolution where new genes 
are generated through gene duplication. Some of these 
genes are maintained for long periods and even through 
population divergence events, while others lose function 
(pseudogenes) or are lost completely. MHC variation is 
also governed by gene conversion, where homologous 
recombination occurs between duplicated genes (paralo- 
gous genes), thus homogenising sequences between dif- 
ferent loci [6,19]. In passerine birds, the MHC is 
characterised by multiple gene copies, pseudogenes and 
long introns, and is exceptionally diverse and complex 
compared to other birds and vertebrate species [20-22]. 
Gene duplication events of MHC can be traced phylo- 
genetically in most lineages, because duplicated genes 
evolve independently. This can be seen in the phylogen- 
etic grouping of orthologous genes, rather than in a 
species-specific grouping [19,23,24]. Alternatively, recent 
duplication and concerted evolution of genes (through 
gene conversion) can result in species-specific clustering 
[6,22,25,26]. Due to the high rate of gene duplication 
and loss, and the confounding effect of gene conversion, 
it is notoriously difficult to re-construct avian MHC 
phylogenies [6]. 

Following a bottleneck or founder event, the genetic 
diversity of a population is reduced to only a subset of 
the original variation. As the population adapts to its 
new environment, the MHC allelic diversity will be 
made up of a combination of ancestral polymorphism 
and novel genetic variation. Trans-species evolution [27] 
or ancestral polymorphism [28] refers to the long-term 
maintenance of ancestral alleles in populations and spe- 
cies [29,30]. This process is governed by balancing selec- 
tion [31] and is seen when related species or subspecies 
share similar or the same MHC alleles despite local se- 
lection pressure. This pattern is common in mammals 



which do not often show concerted evolution, thus ortho- 
logous loci can be recognized between distantly related 
taxa such as mice and humans [24]. The high levels of 
concerted evolution in birds often make it difficult to dis- 
tinguish between orthologous and paralogous loci [25], 
although isolated cases have been reported e.g. [5,32]. 
Novel genetic diversity is introduced in populations either 
through dispersal or mutations. Mutational processes 
include gene duplication, point mutations and gene con- 
version e.g. [26,33]. Gene conversion is known to occur 
frequently in birds at the highly duplicated MHC genes 
[6,26,34,35]. The rate of gene conversion has been shown 
to be far greater than that of point mutations, thus may be 
a very important mechanism for generation of variation in 
bottlenecked populations [9,26]. 

In the present study, we assess MHC variation in the 
Nesospiza bunting species complex and its putative sister 
taxon, Rowettia goughensis. Evaluation of the MHC in 
Nesospiza and R. goughensis is interesting for several rea- 
sons. Nesospiza and R. goughensis are considered sister 
taxa and are presumed to have arrived at Tristan da 
Cunha and nearby Gough Island with the same 
colonization event [36]. Mitochondrial cytochrome b 
sequences are reciprocally monophyletic between island 
systems, and neutral microsatellite markers show sub- 
stantial genetic differentiation between species [37,38]. It 
is thus interesting to compare the MHC differentiation 
and allele sharing in Nesospiza and R. goughensis and 
determine the level of ancestral polymorphism between 
these species. Further, Nesospiza buntings have under- 
gone an ecological adaptive radiation in parallel on two 
islands [37]. Both Nightingale and Inaccessible islands 
are inhabited by large- and small-billed Nesospiza buntings. 
The two species on Nightingale Island (N. questi 
and N. wilkinsi) co-occur with little, if any, interbreeding, 
probably due to the availability of two discrete seed sizes 
within a single habitat. Inaccessible Island has three 
lineages of N. acunhae buntings: large-billed N. a. dunnei, 
and two colour morphs of the small-billed bunting, 
N. a. fraseri and N. a. acunhae [37,39]. Hybridisation 
occurs between all three forms across an ecotone on the 
eastern plateau of Inaccessible Island. This is probably due 
to a large variation of seed sizes occurring at low densities, 
which favours greater diversity in bill-sizes [37]. A single 
Nesospiza species inhabited the main island of Tristan, but 
was driven to extinction shortly after the arrival of 
humans at the archipelago. Genetic structure analysis 
based on neutral microsatellite markers shows little or 
no hybridization between species on Nightingale, and 
strong differentiation between Nightingale Nesospiza 
and those on Inaccessible Island [37,38]. Despite ongoing 
hybridization on Inaccessible Island, a strong association 
has been found between bill morphology, habitat choice 
and genetic differentiation suggesting that both natural 
and sexual selection may maintain differentiation [37,38]. 
Thus, it is possible that these selective pressures will result 
in species-specific patterns of MHC variation. However, 
an alternative hypothesis is that balancing selection has 
maintained most of the MHC variation across the species 
complex. Here we aim to 1) test for signatures of selection 
at the MHCIIβ in Nesospiza buntings, and 2) investigate 
the extent of ancestral polymorphism between Nesospiza, 
its putative sister taxon Rowettia goughensis, and other 
passerine species [5,32,34,35,40,41]. 



Results 

PCR amplification success and nucleotide diversity 

In total, 508 sequences of expected length (159 bp) were 
obtained from 14 Nesospiza from the Tristan da Cunha 
archipelago (10 from Inaccessible and 4 from Nightin- 
gale) and two Rowettia goughensis from Gough Island 
(see Figure 1). Only sequences that were found in two or 
more individuals were included (396 sequences), and 
among these, 23 unique alleles were identified (Figure 2; 
Additional file 1: Table S1). Since the MHC complex 
contains several paralogous loci, alleles cannot be 



assigned to a particular locus. This prevents the use of 
the standard nomenclature of MHC alleles [42], and 
therefore alleles were named Neso01 - Neso23. No stop 
codons or frameshift mutations were present in any of 
these alleles, although one of the sequences (Neso02) 
contained an in-frame two-codon insert, resulting in a 
165 bp sequence. BLAST analysis indicated high similarity 
(87-96%, with coverage of 80-98%) of 21 alleles 
(Neso01 - Neso21) to functional passerine MHCII alleles, 
whereas Neso22 and Neso23 had higher similarity 
(92-93%, with 98% coverage) to passerine pseudogenes. 
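
The filtering rule described above (retain only sequences recovered from two or more individuals) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the individual labels and the short toy sequences are all hypothetical.

```python
from collections import defaultdict


def alleles_in_two_or_more(clones):
    """Return the set of sequences seen in at least two different individuals.

    `clones` is an iterable of (individual_id, sequence) pairs, one pair per
    sequenced clone. Repeated clones from the same individual count once.
    """
    carriers = defaultdict(set)
    for individual, sequence in clones:
        carriers[sequence.upper()].add(individual)
    return {seq for seq, birds in carriers.items() if len(birds) >= 2}


# Hypothetical toy data: three birds, six clones, three distinct sequences.
clones = [
    ("bird_A", "ACGTAC"),
    ("bird_A", "ACGTAC"),  # same bird twice: still a single carrier
    ("bird_B", "ACGTAC"),
    ("bird_A", "ACGTTC"),  # seen in one bird only, so it is discarded
    ("bird_C", "GGGTAC"),
    ("bird_B", "GGGTAC"),
]
retained = alleles_in_two_or_more(clones)
print(sorted(retained))
```

Counting carriers rather than clones is the point of the rule: a PCR or cloning artefact can be sequenced repeatedly from one individual, but is unlikely to arise independently in a second one.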

Each individual Nesospiza contained 3-7 unique presumably 
functional (i.e. excluding known pseudogenes 
Neso22 and Neso23) alleles of MHCIIβ (average ± SD: 
4.63 ± 0.99). Assuming all loci to be heterozygous, the 
minimum number of MHCIIβ loci that must be present in 
Nesospiza is four. This is similar to what has been 
observed in most passerine species (3-7 loci), with the 
exception of common yellowthroat (Geothlypis trichas) 
(20 loci), which has particularly high levels of gene dupli- 
cation [43]. A regression analysis performed to determine 
if the number of alleles sampled approached the max- 
imum for each individual showed that the number of 
Figure 1 Geographic location, vegetational composition, and Nesospiza populations occurring at the Tristan da Cunha archipelago. 

Location of the Tristan da Cunha archipelago in the South Atlantic Ocean with the three main islands: Tristan, Inaccessible, and Nightingale. The 
vegetational composition, and occurring species and morpho-types of Nesospiza buntings are shown for Inaccessible and Nightingale islands 
(adapted from reference 37 and Google Maps). 
Figure 2 Nesospiza MHCIIβ gene tree. Neighbour-Joining tree showing the three well supported Nesospiza MHCIIβ exon 2 allele clusters. Of 
the 23 alleles, 21 were found in the N. acunhae individuals on Inaccessible Island (Neso1-8, 10-13, 15-23), 14 in the N. wilkinsi and N. questi on 
Nightingale (Neso1, 3, 4, 7, 9, 11, 13-15, 17, 18, 20, 22, 23), and 7 in R. goughensis (Neso5, 9, 13-15, 17, 23). Bootstrap support values <70% are not shown. 



alleles did not plateau for 13 of the 16 individuals as the 
number of sequence clones increased (data not shown); 
thus, it is likely that more than four MHCIIβ loci are 
present in Nesospiza. 
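
The minimum locus count given above follows from diploidy: a single locus can contribute at most two different alleles to one individual, so the largest number of alleles seen in any individual sets a lower bound. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
import math


def min_loci(max_alleles_per_individual: int) -> int:
    """Lower bound on the number of loci in a diploid individual.

    Assumes every locus is heterozygous, i.e. contributes two distinct
    alleles, so n alleles need at least ceil(n / 2) loci.
    """
    if max_alleles_per_individual < 1:
        raise ValueError("at least one allele is required")
    return math.ceil(max_alleles_per_individual / 2)


# Up to seven presumably functional alleles were found per individual.
print(min_loci(7))
```

Because the bound assumes full heterozygosity and complete sampling of each individual's alleles, the true number of loci can only be equal or higher.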

Of the 23 alleles, 21 were found in the N. acunhae 
individuals on Inaccessible Island (Neso1-8, 10-13, 15-23), 
14 in the N. wilkinsi and N. questi on Nightingale 
(Neso1, 3, 4, 7, 9, 11, 13-15, 17, 18, 20, 22, 23), and 7 in 
R. goughensis (Neso5, 9, 13-15, 17, 23). The nucleotide 
diversity (π) of putatively functional alleles (i.e. excluding 
the pseudogenes, Neso22 and Neso23) was 0.11 in N. 
acunhae on Inaccessible Island (data from 19 alleles in 
10 individuals), 0.11 in N. wilkinsi on Nightingale (data 
from 8 alleles in 2 individuals), and 0.07 in N. questi on 
Nightingale (data from 7 alleles in 2 individuals). The 
nucleotide diversity found in R. goughensis was 0.04 (data 
from 6 alleles in 2 individuals). 
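
As an illustration of what the π values above measure, the sketch below computes nucleotide diversity as the mean proportion of differing sites over all pairs of aligned alleles. It assumes each unique allele is weighted equally and that sequences are gap-free and of equal length; the software settings actually used are not stated in this section, and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(alleles) -> float:
    """Mean pairwise p-distance across all pairs of alleles."""
    pairs = list(combinations(alleles, 2))
    if not pairs:
        raise ValueError("at least two alleles are required")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 10 bp alleles, not real Nesospiza sequences.
toy_alleles = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
pi = nucleotide_diversity(toy_alleles)
print(round(pi, 3))
```

On this reading, π = 0.11 means that two alleles drawn from the set differ at roughly 11% of the 159 aligned sites on average.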

Selection and recombination 

The PBR was identified after alignment with the human 
HLA-DRB*04 amino acid sequence. Traditional selection 
statistics did not uncover any statistically significant selection 
patterns (Tajima's D = 0.61, p > 0.10; Fu & Li's 
D* = 0.30, p > 0.10; Fu & Li's F* = 0.46, p > 0.10). The 
sampled populations showed no evidence of selection at 
either the PBR or non-PBR regions (Table 1). Null 
models were supported by likelihood ratio tests, with 
only one site likely to be under positive selection 
(Table 2). Tests for recombination in RDP3 Beta 27 
revealed no significant recombination events. 
Table 1 Proportion of non-synonymous (dN) and synonymous (dS) substitutions in MHCIIβ sequences of Nesospiza and 
other passerines 

                             PBR                                                 Non-PBR
Comparisons             N    dN (±SE)        dS (±SE)        dN/dS  z-test       dN (±SE)        dS (±SE)        dN/dS  z-test

Brown et al. [44]
Neso01-23               23   0.377 (±0.146)  0.236 (±0.161)  1.60   n.s. (0.54)  0.133 (±0.029)  0.114 (±0.051)  1.17   n.s. (0.74)
Neso01-21 ‡             21   0.372 (±0.142)  0.234 (±0.174)  1.59   n.s. (0.59)  0.090 (±0.022)  0.097 (±0.047)  0.93   n.s. (0.90)
Group 1                 2    0.031 (±0.031)  0.000 (±0.000)  n/a    n.s. (0.33)  0.011 (±0.011)  0.039 (±0.040)  0.28   n.s. (0.51)
Group 2                 15   0.009 (±0.006)  0.000 (±0.000)  n/a    n.s. (0.16)  0.028 (±0.007)  0.042 (±0.030)  0.67   n.s. (0.68)
Group 3                 6    0.073 (±0.033)  0.036 (±0.049)  2.03   n.s. (0.51)  0.023 (±0.010)  0.014 (±0.014)  1.64   n.s. (0.62)
Inaccessible ‡          19   0.361 (±0.133)  0.222 (±0.149)  1.63   n.s. (0.53)  0.135 (±0.026)  0.115 (±0.050)  1.17   n.s. (0.74)
Nightingale ‡           12   0.420 (±0.151)  0.281 (±0.192)  1.49   n.s. (0.60)  0.166 (±0.034)  0.118 (±0.055)  1.41   n.s. (0.43)
Tristan da Cunha ‡      21   0.377 (±0.146)  0.236 (±0.161)  1.60   n.s. (0.54)  0.133 (±0.029)  0.114 (±0.051)  1.17   n.s. (0.74)
Rowettia goughensis     6    0.486 (±0.188)  0.351 (±0.246)  2.59   n.s. (0.70)  0.194 (±0.040)  0.135 (±0.063)  1.44   n.s. (0.40)

Tong et al. [45]
Neso01-23               23   0.230 (±0.090)  0.125 (±0.159)  1.84   n.s. (0.61)  0.174 (±0.033)  0.143 (±0.051)  1.22   n.s. (0.61)
Neso01-21 ‡             21   0.100 (±0.054)  0.090 (±0.147)  0.11   n.s. (0.96)  0.151 (±0.032)  0.135 (±0.055)  1.12   n.s. (0.79)
Group 1                 2    0.053 (±0.052)  0.000 (±0.000)  n/a    n.s. (0.33)  0.053 (±0.053)  0.000 (±0.000)  n/a    n.s. (0.32)
Group 2                 15   0.018 (±0.019)  0.000 (±0.000)  n/a    n.s. (0.38)  0.024 (±0.006)  0.033 (±0.025)  0.73   n.s. (0.73)
Group 3                 6    0.033 (±0.022)  0.000 (±0.000)  n/a    n.s. (0.14)  0.036 (±0.013)  0.024 (±0.017)  1.50   n.s. (0.58)
Inaccessible ‡          19   0.240 (±0.091)  0.126 (±0.165)  1.90   n.s. (0.57)  0.171 (±0.032)  0.139 (±0.049)  1.13   n.s. (0.59)
Nightingale ‡           12   0.306 (±0.124)  0.152 (±0.189)  2.01   n.s. (0.54)  0.204 (±0.038)  0.152 (±0.058)  1.34   n.s. (0.44)
Tristan da Cunha ‡      21   0.230 (±0.090)  0.125 (±0.159)  1.84   n.s. (0.61)  0.174 (±0.033)  0.143 (±0.051)  1.22   n.s. (0.61)
Rowettia goughensis     6    0.334 (±0.123)  0.163 (±0.189)  2.05   n.s. (0.52)  0.238 (±0.045)  0.186 (±0.069)  1.28   n.s. (0.49)

Other passerines
New Zealand robin       41   0.339 (±0.078)  0.094 (±0.059)  3.6    <0.005       0.076 (±0.019)  0.039 (±0.013)  1.95   n.s.
Chatham Island robin    4    0.373 (±0.086)  0.135 (±0.078)  2.76   <0.05        0.099 (±0.024)  0.020 (±0.014)  5.05   <0.005
Hawaiian honeycreepers  51   0.341 (±0.103)  0.076 (±0.095)  4.49   <0.001       0.121 (±0.038)  0.092 (±0.053)  1.32   n.s.
Common yellowthroat     39   0.608 (±0.120)  0.211 (±0.111)  2.88   <0.05        0.135 (±0.034)  0.137 (±0.034)  0.99   n.s.
House sparrow*          12   0.470 (±0.109)  0.123 (±0.095)  3.82   <0.0001      0.203 (±0.047)  0.200 (±0.051)  1.02   n.s.

References: New Zealand and Chatham Island robins [34,35], Hawaiian honeycreepers [46], common yellowthroat [43]; * Values were calculated from GenBank 
sequences; ‡ Does not include the putative pseudogenes Neso22 and Neso23; N = number of MHC sequences; Brown et al. [44]: PBR = 14 amino acids, 
non-PBR = 41 amino acids; Tong et al. [45]: PBR = 9 amino acids, non-PBR = 46 amino acids. 
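
For readers checking Table 1, the dN/dS column is the ratio of the two proportions in the same row, and it is reported as n/a when no synonymous substitutions were observed (dS = 0). A minimal sketch using two PBR entries and one non-PBR entry from the Brown et al. [44] block; the helper name is hypothetical, and the z-test itself is not reproduced here because it depends on variance estimates that are not listed in the table.

```python
def dn_ds_ratio(dn: float, ds: float):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds


# (dN, dS) pairs copied from Table 1, Brown et al. [44] definition of the PBR.
table_rows = {
    "Neso01-23, PBR": (0.377, 0.236),
    "Neso01-23, non-PBR": (0.133, 0.114),
    "Group 2, PBR": (0.009, 0.000),
}

for label, (dn, ds) in table_rows.items():
    ratio = dn_ds_ratio(dn, ds)
    shown = "n/a" if ratio is None else f"{ratio:.2f}"
    print(f"{label}: dN/dS = {shown}")
```

A ratio above 1 is only suggestive on its own; the standard errors in Table 1 are large relative to the estimates, which is consistent with the non-significant z-tests.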



Phylogenetic analysis 

A consensus Neighbour-Joining tree of the 23 Nesospiza 
alleles showed three highly supported groups, called 
Nesospiza Group 1-3 (Figure 2). The same three Nesos- 
piza groups were highly supported within genealogies for 
passerine MHCIIβ reconstructed from exon 2 sequences 
using Bayesian inference (Figure 3). Group 1, containing 
Neso22 and Neso23 and a red-winged blackbird 
pseudogene (Agelaius phoeniceus; APAF030990), forms a 
highly supported, diverged cluster. A second red-winged 
blackbird pseudogene (APAF030994) and a vegetarian 
finch (Platyspiza crassirostris) pseudogene (PCAY064469), 
however, group with other presumably functional passer- 
ine MHC sequences. 



Table 2 Parameter estimates and results from four selection models as implemented in CODEML 

Model                     Log-likelihood  Parameter estimates                                                     Positively selected sites
M1a (nearly neutral)      -576.502        p0 = 0.385, p1 = 0.615, ω0 = 0.038, ω1 = 1.000                          Not allowed
M2a (positive selection)  -574.685        p0 = 0.311, p1 = 0.628, p2 = 0.061, ω0 = 0.000, ω1 = 1.000, ω2 = 4.847  None
M7 (beta)                 -576.474        p = 0.033, q = 0.018                                                    Not allowed
M8 (beta and omega)       -574.720        p0 = 0.941, p1 = 0.059, p = 0.028, q = 0.015, ω = 4.612                 37 N
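
The likelihood ratio tests that compare the nested models in Table 2 can be reproduced from the log-likelihoods alone. The sketch below assumes the conventional comparison of M1a against M2a and of M7 against M8, each with two degrees of freedom; for a chi-square distribution with two degrees of freedom the upper-tail probability is exactly exp(-x/2), so no statistics library is needed. The function name is hypothetical.

```python
import math


def lrt_df2(lnl_null: float, lnl_alt: float):
    """Likelihood ratio test for nested models differing by 2 parameters.

    Returns (statistic, p_value). The statistic is 2 * (lnL_alt - lnL_null),
    floored at zero; the p-value uses the chi-square survival function for
    df = 2, which has the closed form exp(-statistic / 2).
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Log-likelihoods taken from Table 2.
stat_m1a_m2a, p_m1a_m2a = lrt_df2(-576.502, -574.685)
stat_m7_m8, p_m7_m8 = lrt_df2(-576.474, -574.720)

print(f"M1a vs M2a: 2*dlnL = {stat_m1a_m2a:.3f}, p = {p_m1a_m2a:.3f}")
print(f"M7 vs M8:   2*dlnL = {stat_m7_m8:.3f}, p = {p_m7_m8:.3f}")
```

Both statistics fall below the 5% critical value for two degrees of freedom (5.99), which agrees with the statement that the null models were supported.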
Figure 3 Passerine MHCIIβ gene tree. Gene tree of the MHCIIβ exon 2 sequences of Nesospiza and sequences of several other passerine 
species obtained from GenBank. A Bayesian analysis of 159 bp of sequences, with independent mutational models applied to each codon 
position (Position 1: TIM3ef + I + G; Position 2: TVM + G; Position 3: TPM2uf + G). Bayesian posterior probabilities are indicated at the nodes; 
values < 0.95 are not shown. 
Group 2 (Neso01-13, 20-21) is distinct and appears to 
be a well-supported cluster of presumably functional 
MHC alleles unique to Nesospiza and R. goughensis. 
Group 3 (Neso14-19), which also contains sequences 
shared by Nesospiza and R. goughensis, is well supported, 
but clusters more closely with sequences from the distantly 
related common yellowthroat, New Zealand robin 
(Petroica australis), Chatham Island robin (Petroica 
traversi), Florida scrub jay (Aphelocoma coerulescens) 
and vegetarian finch. Of the other passerine species, zebra 
finch, Florida scrub jay, and little greenbul (Andropadus 
virens; with the exception of one sample) cluster by spe- 
cies or, in the case of New Zealand and Chatham Island 
robins (Petroica australis), with sister species. Sequences 
of the great reed warbler (Acrocephalus arundinaceus) are 
scattered throughout the phylogeny as small groups or 
single alleles, apart from one supported group divergent 
from most other passerine sequences. The sequences 
of several passerines, namely house finch (Carpodacus 
mexicanus), vegetarian finch, red-winged blackbird, 
and common yellowthroat, cluster with those of other 
species throughout the phylogeny. 

Discussion 

This study describes 23 MHCIIβ alleles representing at 
least four functional loci and two pseudogenes in the 
Nesospiza bunting species complex. Many MHCIIβ alleles 
were shared between Nesospiza taxa as well as between 
Nesospiza and its putative sister taxon R. goughensis. This 
pattern of ancestral polymorphism suggests that the 
observed gene duplications occurred prior to the phylogenetic 
split of the species, and subsequent unusually low selective 
pressure at the loci has prevented allelic divergence 
between species. The MHC nuclear genetic diversity in 
Nesospiza on Inaccessible (π = 0.11) was comparable to 
that of outbred passerine species (e.g. 0.15 in Luscinia svecica; 
[5]), and despite the low sample size for Nightingale, 
allele numbers and nucleotide diversity were higher than in 
the severely bottlenecked Chatham Island robin population 
(0.05) [35]. We have screened 14 Nesospiza individuals for 
MHC variation, which is similar to some previous passerine 
MHC studies using cloning and sequencing e.g. 
[34,35,43,47]. However, because larger sample sizes would 
have been necessary to cover the variation of each popula- 
tion sufficiently, we will not discuss population-level MHC 
variation further. 

Patterns of both ancestral polymorphism and concerted 
evolution among Nesospiza and Rowettia populations are 
evident from our results. Ancestral polymorphism, found 
here for Nesospiza and R. goughensis, as well as in other spe- 
cies (e.g. great reed warbler, house finch, vegetarian finch, 
red-winged blackbird and common yellowthroat), can be 
seen in the sharing of the same or similar alleles between 
species (Figures 2 and 3). Of the 23 Nesospiza alleles, 15 



were found in species from both islands. All seven alleles occurring 
in R. goughensis are shared with Nesospiza (Neso5, 9, 
13-15, 17, 23) and these alleles are found in all three Nesospiza 
groups in the gene tree (Figures 2 and 3). The estimated 
minimum number of putatively functional gene 
copies in Nesospiza (i.e. 4 loci) suggests that the three Nesospiza 
allele groups are not necessarily locus-specific, despite 
their divergent clustering. Group 3 may represent a single 
locus, since only one or two alleles from this cluster occur 
in each individual. However, this is not the case for R. goughensis, 
where three of these alleles occur in one individual. 
Two highly supported clusters are seen within Group 2 
(Figure 3), which is also the cluster containing the most 
alleles, suggesting that this cluster is likely to represent 
more than one gene copy. A likely explanation for the clus- 
tering of alleles from different gene loci is the genetic 
homogenization caused by gene duplication events with 
subsequent gene conversion. 
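
The statement that the R. goughensis alleles fall in all three sequence groups can be checked directly from the allele numbers reported in the Results. The sketch below is only a bookkeeping aid: group membership is taken from the text (Group 1: Neso22-23; Group 2: Neso01-13 and 20-21; Group 3: Neso14-19), and the variable and function names are hypothetical.

```python
# Allele numbers per sequence group, as given in the phylogenetic analysis.
GROUPS = {
    "Group 1": {22, 23},
    "Group 2": set(range(1, 14)) | {20, 21},
    "Group 3": set(range(14, 20)),
}

# Alleles recovered from R. goughensis (all shared with Nesospiza).
ROWETTIA = {5, 9, 13, 14, 15, 17, 23}


def alleles_by_group(alleles):
    """Map each group name to the sorted alleles from `alleles` it contains."""
    return {name: sorted(alleles & members) for name, members in GROUPS.items()}


by_group = alleles_by_group(ROWETTIA)
for name, members in by_group.items():
    print(name, members)
```

Three of the seven R. goughensis alleles belong to Group 3, which matches the observation above that more than two Group 3 alleles can occur in one R. goughensis individual.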

The highly supported branches of sequences forming 
Groups 2 and 3 in the gene tree contain only Nesospiza and 
R. goughensis alleles. Although several species were included 
due to the similarity between their MHCIIβ alleles and those 
of Nesospiza, the observed divergent clustering of Group 2 
sequences could be explained by a lack of closely related 
species in the analysis. Alternatively, the species-specific 
clustering of Nesospiza may be attributed to their long diver- 
gence time from the other passerines sampled [48]. The 
deep divergence of Groups 2 and 3, and the clustering of 
Group 3 close to the distantly related species of common 
yellowthroat, New Zealand robins, Florida scrub jay, and 
vegetarian finch, however, provide strong support for pre- 
served ancestral polymorphism. These patterns suggest that 
extant MHC variation in Nesospiza and R. goughensis can be 
explained by shared ancestral polymorphism during colon- 
isation which has since been maintained. It is possible that 
the additional variation has been generated by gene conver- 
sion events, which is the most likely method of generating 
variation from the few alleles remaining in a population fol- 
lowing a population bottleneck [26]. 

Amino acid sequences are more similar between Groups 1 
and 3 (Figure 4). This could either represent evidence of re- 
combination with the pseudogenes, producing a new group 
of functional sequences, or perhaps more likely indicate that 
the pseudogenes resulted from gene duplication events of 
Group 3 sequences. Copying errors during gene duplication 
and recombination events may result in non-functional 
genes (pseudogenes) and the subsequent lack of functional 
constraint on evolutionary processes (such as mutation) acting 
on the pseudogenes results in rapid sequence divergence 
[49]. This is evidently the case for the two presumably non- 
functional alleles, Neso22 and Neso23, which form a well 
supported group with a red-winged blackbird pseudogene, 
clustered sister to all the functional passerine sequences. 
However, some pseudogenes (e.g. red-winged blackbird 
Figure 4 Assignment of Nesospiza MHCIIβ peptide binding sites. Alignment of MHCIIβ exon 2 amino acid sequences of Neso01 - 23 
indicating amino acid differences between Groups 1-3 sequences. Amino acid identity is shown by "." and an alignment gap by "-". Alignment 
with human HLA-DRB*04 (GenBank accession: NM_021983) was used to assign peptide binding sites (*) according to Brown et al. [44] and Tong 
et al. [45]. 



APAF030994 and vegetarian finch PCAY064439) do not 
show evidence of rapid divergence (Figure 3), perhaps due to 
ongoing recombination with functional genes that is leading 
to sequence conservation. Alternatively, there may have been 
insufficient time for the genes to become highly diverged 
since they became non-functional. 

Selection tests showed no consistent evidence of balancing 
or positive selection at the PBR or non-PBR regions 
of MHCIIβ exon 2 in Nesospiza and Rowettia. The short 
fragment length of our sequences excludes some of the 
PBR sites, and therefore there is a chance that some sites 
that may be under selection were excluded from the ana- 
lyses. However, selection tests were done according to two 
different PBR characterizations [44,45], and tested on the 
entire data set as well as all species individually, and the 
three clusters independently. Ratios of dN/dS were non-significant 
in all cases (Table 1), and additional selection 
tests showed weak evidence of selection with only one site 
likely to be under positive selection (Table 2). New MHC 
variation can be generated by point mutations or through 
recombination between alleles, giving rise to a new allele 
[26,33]. The latter process, known as gene conversion, has 
been documented in some natural avian populations 
[22,25,26] and has been suggested to be essential in gener- 
ating genetic variation at MHC after a bottleneck [26]. 
During gene conversion events, synonymous substitutions 
may hitchhike with non-synonymous variation [26] and 
this may be a reason why dN/dS ratio tests fail to detect 
positive selection. We found, however, no evidence of recombination 
in our data, but recombination can be difficult 
to verify with short sequences. 

Despite the lack of significant evidence for selection, 
ratios of dN/dS > 1.0 that we observe in Rowettia and all 



Nesospiza populations indicate that the loci are under 
weak balancing selection, or perhaps more likely, that 
ancestral balancing selection acted on the loci before 
colonisation of the islands. Lack of strong positive selec- 
tion may reflect a decreased pathogen load in both 
Nesospiza and R. goughensis. Passerines generally are less 
parasitised by lice and ectoparasites than other avian 
orders e.g. [50]. This is particularly true of small popula- 
tions on isolated oceanic islands (R Palma pers. comm.). 
Myrsidea lice occur at extremely low prevalence (6.4%) 
across 12 species of Darwin's finches at the Galapagos 
Islands [50]. On Tristan da Cunha and Gough Island, 
different louse species (order Phthiraptera) have been 
found on 20 bird species, including the Tristan thrush 
(Nesocichla eremita) [51], yet careful inspection of 
Nesospiza buntings yielded no lice, with hippoboscid 
flies and feather mites the only ectoparasites (PG Ryan 
unpubl. data). The absence of parasites could be due to 
an uninfected founding population ("missing the boat") 
[52], or subsequent extinction from the host after colonisation. 
The high level of ancestral polymorphism between 
R. goughensis and Nesospiza suggests that the 
former is more likely, where a single uninfected founding 
population colonised both Tristan da Cunha and 
Gough Island. 

Some shortcomings of the cloning and sequencing 
method employed in the study may result in underestima- 
tion of MHC variation. Firstly, the large number of gene 
copies and the high level of convergence between loci make 
it difficult to amplify a single MHC locus at a time. Thus, 
most MHC studies on non-model vertebrates amplify alleles 
from multiple gene copies simultaneously. This increases 
the risk of chimera formation during the PCR, which in turn 
leads to overestimation of levels of gene recombination [53]. 
In addition, PCR products are prone to point mutations e.g. 
[54], although these are relatively easy to detect since muta- 
tion rates are relatively low and are unlikely to occur in 
more than one sequence [55,56]. In this study, we compen- 
sate for these problems by only accepting alleles that occur 
in at least two individuals e.g. [57,58]. Secondly, the amplifi- 
cation of a multi-gene family is necessarily problematic since 
not all loci and not all alleles at a locus will be detected 
using a single primer set. The primers employed in this 
study were designed for non locus-specific amplification of 
exon 2 of MHCIIβ in zebra finch (Taeniopygia guttata) [59] 
and have been successfully employed in other passerine 
MHC studies (H Westerdahl pers. comm.). A regression 
analysis of the number of clones sequenced per individual 
found that more individuals and sequences will be necessary 
to estimate true MHC variation per individual. Finally, 
sequences were obtained for only half of the variable 
MHCIIβ exon 2 gene. Although not all the variation has 
been analysed in this study, this is often the case with such 
complex multi-gene systems [58] and does not preclude our 
finding of ancestral polymorphism between species and 
within the Nesospiza species complex. More comprehensive 
studies of population level variation of MHC would require 
that more individuals and sequences were analysed. How- 
ever, the present study focuses on selection and levels of 
shared polymorphism, and for such analyses the present 
data are sufficient. 

Conclusions 

The extent of shared alleles and ancestral polymorphism between 
Nesospiza and R. goughensis suggests that both originated 
from the same colonization wave. We find that 
similar or the same alleles are maintained between species 
due to the recent species divergence and low levels of (local) 
selection acting on PBR. The additional variation found 
within the Nesospiza species complex may be due to gene 
conversion, which is likely the most prominent mechanism 
for generating new variation after a bottleneck event [26]. 
The extant genetic variation is not likely to change rapidly, 
unless there is a drastic geographic or environmental change 
leading to strong selection at the MHC. One such situation 
would be the introduction of pathogens, since populations 
with low MHC diversity are often more susceptible to novel 
pathogens [35,60]. In the absence of strong selection, MHC 
is expected to diverge over time between islands and popu- 
lations due to drift, with the generation of new haplotypes 
through point mutations or gene conversion. Ongoing gene 
flow between populations and subspecies on Inaccessible Is- 
land can maintain genetic variation to some extent. The po- 
tential role of MHC dependent sexual selection [22,61] to 
drive divergence between populations even further remains 
open to study, and would require wider sampling over the 
entire geographic range to cover the details of geographic- 
and species-specific variation. 

Methods 

Sampling 

Buntings were mist-netted or caught with hand nets at 
Inaccessible, Nightingale and Gough Islands during 
September 1999 - February 2000, with additional sam- 
ples from Inaccessible Island collected in September - 
November 2004 [37,38]. No extant Nesospiza species 
occur on Tristan Island. Brachial vein blood samples 
were collected and stored in EDTA or lysis buffer. Two 
to three individuals were chosen to represent each population 
(Figure 1; Inaccessible: 3 N. a. acunhae, 2 N. a. fraseri, 
2 N. a. dunnei, 3 N. a. hybrid; Nightingale: 2 N. questi, 
2 N. wilkinsi; Gough Island: 2 R. goughensis). 

DNA extraction and amplification 

DNA was extracted from whole blood by standard phenol-chloroform 
methods [Sambrook]. The primers 2zffw1 
(5' TGT CAC TTC AYK AAC GGC ACG GAG 3') and 
2zfrv1 (5' GTA GTG TGC CGG CAG TAC GTG TC 3'), 
previously designed for the zebra finch (Taeniopygia guttata) 
[59], were used to amplify 159 bp of MHCIIB exon 
2. These primers are not locus-specific and amplify exon 
2 of multiple copies of the MHCIIB gene. Amplifications 
were performed in 10 µl volumes, each containing 5 µl 
QIAGEN Multiplex PCR Master Mix, 10 pM of each primer, 
and 10 ng of template DNA. PCR cycling conditions 
involved an initial denaturing step of 15 minutes at 95°C, 
followed by 35 cycles of 30 seconds at 94°C, 1 minute 30 
seconds at 64°C and 1 minute 30 seconds at 72°C. 
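
The forward primer contains two IUPAC degenerate positions (Y = C or T, K = G or T), so it is in effect a pool of four oligonucleotides. A minimal Python sketch of that expansion is given here for illustration; the function name `expand_primer` is hypothetical and not part of the original protocol.

```python
from itertools import product

# IUPAC codes needed for this primer; extend the table for other degenerate bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "K": "GT"}

def expand_primer(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    bases = [IUPAC[b] for b in primer.replace(" ", "").upper()]
    return ["".join(combo) for combo in product(*bases)]

forward = "TGT CAC TTC AYK AAC GGC ACG GAG"
variants = expand_primer(forward)  # one entry per Y/K combination
```

Each variant is 24 nt long and differs only at the two degenerate positions.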

Cloning and sequencing 

PCR products of all individuals were cloned using the 
TOPO TA Cloning® kit (Invitrogen). Vectors (pCR® 
2.1-TOPO®) with inserted PCR product were used to 
transform chemically competent Escherichia coli cells 
(OneShot®), according to the manufacturer's instructions. 
Transformed cells were cultured in S.O.C. 
medium (Invitrogen) for one hour in a shaking incubator 
at 37°C and then incubated overnight at 37°C on LB 
medium supplemented with 50 µg/ml ampicillin and 
50 µl of X-gal (40 mg/ml). For each sample, 30 positive 
colonies were picked with a sterile toothpick, diluted in 
100 µl Sabax water (Adcock Ingram) and used directly 
as DNA template for PCR. Amplification reactions contained 
2 µl QIAGEN Multiplex Master Mix, 10 pM each 
of M13 forward and M13 reverse primers (included in 
the kit), and 2 µl of the colony diluted in Sabax water. 
The same PCR cycling conditions were used as for the 
initial amplification. All clones were sequenced in both directions 
on an ABI Prism 3100 capillary sequencer (Applied 
Biosystems). A total of 12 - 29 clones were successfully 
sequenced per individual (average = 22.88). 

Data analysis 

Nucleotide sequences were edited and aligned using CLC 
Main Workbench 5.0.2 (CLC Bio). To avoid including 
false haplotypes due to artefacts arising during PCR (e.g. 
recombinant chimeric sequences), sequences were only 
accepted if they were present in two or more individuals 
[56,62] (396 of 508 sequences were accepted and these 
represented 23 different alleles; Additional file 1: Table S1). 
Due to the large number of sequences excluded with this 
stringent method, we followed the suggestion of Anmarkrud 
et al. [5] to identify additional true alleles and evaluated 
whether the excluded sequences were >1.5% (~3 bp) different 
from any of the sequences that were identified as 
possible alleles. Only two of the excluded sequences differed 
by >1.5%, and since so few alleles would not affect 
the results we decided not to include them in the analyses. 
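
The two filtering steps described above can be sketched in Python as follows. This is an illustrative outline only, not the procedure as run in CLC Main Workbench: the function names are hypothetical, sequences are assumed to be aligned to equal length, and the 1.5% criterion is interpreted here as "differs by more than 1.5% from every accepted allele".

```python
from collections import defaultdict

def accepted_alleles(clones, min_individuals=2):
    """clones: iterable of (individual_id, sequence) pairs.
    Returns the sequences found in at least min_individuals individuals."""
    carriers = defaultdict(set)
    for individual, seq in clones:
        carriers[seq].add(individual)
    return {s for s, inds in carriers.items() if len(inds) >= min_individuals}

def percent_difference(a, b):
    """Percentage of mismatched sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)

def divergent_rejects(clones, alleles, threshold=1.5):
    """Rejected sequences that differ by more than threshold percent
    from every accepted allele (assumed reading of the criterion)."""
    rejected = {seq for _, seq in clones} - alleles
    return {r for r in rejected
            if all(percent_difference(r, a) > threshold for a in alleles)}

# Toy data, far shorter than the 159 bp fragments used in the study.
toy = [("bird1", "ACGT"), ("bird2", "ACGT"), ("bird1", "ACGA"), ("bird2", "TTTT")]
toy_alleles = accepted_alleles(toy)
```

Requiring two independent carriers removes most chimeras and polymerase errors, at the cost of also discarding genuine alleles that happen to occur in a single sampled bird.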

The nucleotide diversity (π) was calculated using DnaSP 
5.0 [63]. Sequences were verified as MHC alleles using the 
BLASTN 2.2.24 algorithm [64] available through the National 
Center for Biotechnology Information (NCBI). Of 
the 23 alleles identified, 21 (Neso01 - Neso21) showed high 
identity (87-96%, with coverage of 80-98%) to known passerine 
MHCIIB coding genes, and two alleles (Neso22 and 
Neso23) showed high identity (92-93%, with 98% coverage) 
with passerine pseudogenes (Figure 3). This suggests 
that Neso22 and Neso23 are non-functional, thus they 
were excluded from the selection tests. 
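
For readers unfamiliar with the statistic, nucleotide diversity is the average proportion of sites that differ between two sequences drawn from the sample. The sketch below shows that definition for gap-free aligned sequences; it is a simplified illustration and does not reproduce DnaSP's handling of gaps, missing data or other options.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

# Toy alignment: pairwise differences are 1, 2 and 1 over 4 sites.
pi_toy = nucleotide_diversity(["AAAA", "AAAT", "AATT"])
```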

A regression analysis was performed to determine if the 
number of sequences obtained for each individual effect- 
ively sampled the total number of alleles. For each individ- 
ual, a random subset of the alleles obtained was sampled 
and the number of alleles in the subset counted. This was 
repeated 100 times each for a subset of 5, 10, 15, 20 and 
25 (restricted by the number of sequences obtained for 
each individual). As sampling approaches the maximum 
number of alleles in the population, the number of alleles 
found in increasing subset sizes will plateau. 
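
The resampling procedure can be sketched as below. This is a minimal illustration under stated assumptions: clones are drawn without replacement, each clone is represented by its allele label, and the function name and toy clone counts are hypothetical rather than taken from the study.

```python
import random

def rarefaction(clone_alleles, sizes=(5, 10, 15, 20, 25), reps=100, seed=0):
    """clone_alleles: one allele label per sequenced clone of one individual.
    Returns {subset size: mean number of distinct alleles over reps draws}."""
    rng = random.Random(seed)
    result = {}
    for k in sizes:
        if k > len(clone_alleles):  # restricted by the clones available
            continue
        counts = [len(set(rng.sample(clone_alleles, k))) for _ in range(reps)]
        result[k] = sum(counts) / reps
    return result

# Toy individual with 22 clones representing three alleles.
toy_clones = ["Neso01"] * 10 + ["Neso02"] * 8 + ["Neso03"] * 4
curve = rarefaction(toy_clones)
```

If the mean allele count is still rising at the largest subset size, the clones sequenced for that individual have probably not captured all of its alleles.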

Nucleotide positions associated with the PBR were 
assigned according to the PBR regions determined for the 
human antigen binding region by two different studies 
[44,45]. Selection was tested using the ratio of nonsynonymous 
(dN) to synonymous (dS) substitutions (dN/dS = ω). 
Under strict neutrality dN = dS, while regions under balancing 
selection are expected to undergo more nonsynonymous 
substitutions and regions under directional selection 
more synonymous substitutions. The parameter ω was calculated 
in MEGA 4 [65] using the method of Nei and 
Gojobori [66] with Jukes-Cantor corrections and 1000 
bootstrap replicates. A z-test [66] was used to determine 
the probability of selection by comparing the selection 
parameter, ω, against a null hypothesis of strict neutrality 
(dN = dS). Standard selection tests (Tajima's D, Fu & Li's F* 
and Fu & Li's D*) were calculated in DnaSP 5 [63]. Substitution 
rates, ω, and the probability of positive selection on 
PBR and non-PBR regions, were compared to results from 
New Zealand and Chatham Island robins (Petroica australis 
and Petroica traversi) [34,35], Hawaiian honeycreepers 
(Drepanidinae) [46], common yellowthroat (Geothlypis 
trichas) [43], and house sparrow (Passer domesticus; 
values calculated using sequences from GenBank). 
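
The arithmetic behind the correction and the test statistic can be illustrated as follows. This sketch covers only the final steps (Jukes-Cantor correction of an observed proportion of differences, the dN/dS ratio, and the Z statistic with variances such as those obtained by bootstrapping); counting synonymous and nonsynonymous sites by the Nei-Gojobori method is not shown, and the function names are hypothetical, not MEGA's.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor corrected distance from an observed proportion p of differences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)

def omega(d_n, d_s):
    """Ratio dN/dS; undefined when dS is zero."""
    if d_s == 0:
        raise ZeroDivisionError("dS is zero, so the ratio is undefined")
    return d_n / d_s

def z_statistic(d_n, d_s, var_n, var_s):
    """Z = (dN - dS) / sqrt(Var(dN) + Var(dS)); tests the null dN = dS."""
    return (d_n - d_s) / math.sqrt(var_n + var_s)
```

A Z value near zero is consistent with the null hypothesis of strict neutrality, whereas a large positive value indicates an excess of nonsynonymous substitutions.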

In a second test of selection, the maximum likelihood 
method implemented in CODEML in the Phylogenetic 
Analysis by Maximum Likelihood package (PAML 3.14) 
[67,68] was used to identify the sites under selection. 
Likelihood ratio tests in CODEML were used to compare 
neutral models with models of selection. In a first comparison, 
a neutral model M1a (ω0 < 1, ω1 = 1) was tested 
against M2a, a model for positive selection (ω2 > 1). 
Model M1a assumes that sites are either conserved or 
under purifying selection (i.e. deleterious variants are 
removed from the population) (ω0 < 1), or selectively neutral 
(ω1 = 1). Model M2a 
considers a third class of sites where sites may be under 
positive selection (ω2 > 1). A second comparison tested a 
neutral model M7 (0 < ω < 1) against a model for positive 
selection, M8 (0 < ω < 1, ω > 1). Model M7 is based on a 
β distribution and estimates ω as a value between 0 and 
1. In M8, ω is estimated directly from the data for one 
class of sites which allows for ω > 1. Both these tests are 
used routinely to identify sites under selection [69]. The 
best-fit model was determined using a likelihood ratio 
test for each model comparison, thus the likelihood of 
positive selection could be evaluated [70]. Twice the difference 
in log-likelihood values of the null model (M1a, M7) and the 
alternative model (M2a, M8) was compared with the χ2 
distribution. Degrees of freedom were calculated as the 
difference in the number of parameters for each test. 
The Bayes Empirical Bayes method, implemented in 
CODEML, was used to calculate the posterior probability 
for each site class for the M2a and M8 models. A site 
is likely to be under positive selection when the posterior 
mean of ω > 1 [68]. 
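
As an illustration of the model comparison, the sketch below computes a likelihood ratio test for a pair of nested models that differ by two free parameters, which is the case for both M1a versus M2a and M7 versus M8. For two degrees of freedom the upper tail of the chi-square distribution has the closed form exp(-x/2), so no statistics library is needed. The log-likelihood values in the example are hypothetical and are not results from this study.

```python
import math

def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by two parameters.
    Returns (statistic, p_value), where statistic = 2 * (lnL_alt - lnL_null)."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, math.exp(-stat / 2.0)

# Hypothetical log-likelihoods for a null and an alternative model.
stat, p_value = lrt_df2(-1250.0, -1245.0)
```

A small p-value favours the model that allows a class of sites with ω > 1; otherwise the neutral model is retained.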

To determine the phylogenetic relationship between the 
23 Nesospiza alleles, a Neighbour-Joining (NJ) tree was constructed 
in MEGA 4 [65] assuming homogeneous substitution 
patterns among lineages and uniform rates among sites. 
A consensus tree was computed from 10 000 bootstrap 
replicates in MEGA 4 [65] using a 75% consensus cut-off 
value. All subsequent phylogenetic analyses were conducted 
in MrBayes v 3.1.2 [70]. A concatenated data set comprising 
MHCIIB sequences from several passerines obtained from 
GenBank (Figure 3) was analysed with all Nesospiza alleles 
(Neso01 - Neso23). The passerine species most closely 
related to Nesospiza, chosen as the top ten hits for each 
Nesospiza allele using BLAST, and several other passerine 
species (chosen to represent passerine diversity), were used 
for the phylogenetic analyses. Sequences were only included 
if there was sequence alignment of more than 100 bp, thus 
some species (e.g. Poephila acuticauda) identified to be in 
the top ten closest matches to one of the Nesospiza alleles 
were not included. This cut-off was made to ensure a robust 
result from the phylogenetic analysis. 

The best model for nucleotide substitution was chosen 
using the Akaike Information Criterion (AIC) [71] as determined 
by jModelTest [72,73] for each codon position independently 
(Position 1: TIM3ef + I + G; Position 2: TVM + G; 
Position 3: TPM2uf + G). Divergent zebra finch sequences 
were chosen as a root for passerine MHCIIB [60]. MrBayes 
was run for 3 million generations with four incrementally 
heated chains. Trees were sampled every 3 000 generations, 
with a 10% burn-in. A consensus tree and posterior prob- 
abilities were calculated from the sampled trees. The average 
standard deviation of split frequencies between two simul- 
taneous runs was monitored to confirm convergence. 

The RDP3 Beta 27 [74] package was used to test for 
signatures of recombination using multiple algorithms 
simultaneously: RDP [75], GENECONV [76], BootScan 
[77], MaxChi [78], Chimaera [79], and 3Seq [80]. The 
default settings were used, and the significance level was 
set to 0.05. Bonferroni corrections were applied for mul- 
tiple comparisons [81]. 
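
The Bonferroni adjustment itself is simple: with m comparisons, each p-value is judged against 0.05 / m. The sketch below illustrates that rule only; how RDP3 counts comparisons internally is not modelled, and the p-values shown are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (threshold, flags); a p-value is significant if p <= alpha / m."""
    m = len(p_values)
    if m == 0:
        return alpha, []
    threshold = alpha / m
    return threshold, [p <= threshold for p in p_values]

# Hypothetical p-values from three comparisons.
threshold, significant = bonferroni([0.001, 0.02, 0.2])
```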

Additional material 

GenBank accession numbers of non-Nesospiza sequences 
used in the present study: L42334 - L42335, U23968 - 
U23969, U23967, U23970, U23971, AJ404371 - AJ404376, 
U24405, AY437900 - AY437912, AY428561 - AY428568, 
AY258333 - AY248335, AY428569, U23958 - U23966, 
U23972, U23973, U23975, XM_002192161, XM_002193356, 
XM_002196138, XM_002197722, XM_002198130, 
XM_002198161, XM_002199709, XM_002200257, 
AF165156 - AF165157, AF165159, Z74424 - Z74428, 
AY064425, AY064439, AY064451, GQ247601 - GQ247606, 
GQ247608 - GQ247609, GQ247613 - GQ247614, 
GQ247616 - GQ247622, GU390288 - GU390291, AY518171 - 
AY518183, AY583092 - AY583094. 

Additional file 



Additional file 1: Table S1. List of 23 major histocompatibility complex 
class II B (MHCIIB) exon 2 Nesospiza bunting sequences used in the 
present study. 